On Arabic Texts Compression and Searching

Author

  • Hassen Sallay
Abstract

With the dramatic increase in electronic Arabic content, text compression techniques will play a major role in several domains and applications, such as search engines, data archiving, and searching and retrieval from huge databases. In particular, combining compression and indexing techniques makes it possible to work directly on compressed text files or databases, which saves time and resources. Existing compression techniques and tools are generic and do not consider the specific characteristics of the Arabic language, such as its derivational nature. Compression techniques should instead be based on the morphological and grammatical characteristics of the Arabic language, the subject of the texts, and their statistical characteristics. The paper surveys the state of the art in Arabic text compression techniques and tools and identifies research tracks that should be explored in the future. It also presents some dedicated Arabic text compression algorithms that save more physical space and speed up data retrieval from text files by searching them in compressed form.
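To make "searching in compressed form" concrete, the following is a minimal sketch and not the algorithm presented in the paper. It assumes a simple semi-static, word-based model in which each distinct word is replaced by an integer code ranked by frequency; the function names `build_codebook`, `compress`, and `search` are hypothetical. A query is encoded with the same codebook and matched against the code sequence, so the text is never decompressed.

```python
from collections import Counter


def build_codebook(words):
    """First pass of a semi-static model: frequent words get smaller codes."""
    counts = Counter(words)
    # Sort by descending frequency; break ties by the word itself for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: code for code, word in enumerate(ranked)}


def compress(words, codebook):
    """Second pass: replace every word by its integer code."""
    return [codebook[w] for w in words]


def search(compressed, pattern_words, codebook):
    """Return word positions where the pattern occurs, without decompressing."""
    if not pattern_words:
        return []
    try:
        pattern = [codebook[w] for w in pattern_words]
    except KeyError:
        # A word that is not in the codebook cannot occur in the text.
        return []
    n, m = len(compressed), len(pattern)
    return [i for i in range(n - m + 1) if compressed[i:i + m] == pattern]


# Illustrative sample sentence (an assumption, not taken from the paper).
text = "كتب الطالب الدرس ثم قرأ الطالب الدرس"
words = text.split()
codebook = build_codebook(words)
compressed = compress(words, codebook)
hits = search(compressed, ["الطالب", "الدرس"], codebook)
print(compressed, hits)
```

A practical scheme would additionally entropy-code the integers (for example with Huffman codes) and, for Arabic, might model roots and morphological patterns rather than whole surface words, as the abstract suggests.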


Similar resources

Characteristics of Arabic Identity in Intellectual System of Hisham Kalbi based on his Books on Genealogy

Science of "Genealogy" was one of the branches of History and Historiography during the age of Jāhilīyah (age of ignorance) which has grown rapidly in the Islamic era. In this context, Hisham Kalbi (d. 204 AH. / 819 AD.), as the first author and editor of Genealogy, has a great contribution to the formation and prosperity of this science, with two important texts, the Jamharat Al-Ansab and Nasa...


Direct Pattern Matching on Compressed Text

We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential pattern matching algorithm. Approximate search can also be done efficiently without any decoding. The compression scheme uses a semi-static word-based modeling and a Huffman coding where the coding alpha...


The Reality of Arabic Fiction Translation into English: A Sociological Approach

English translations of texts associated with Arabic fiction remain largely unexplored from a sociological perspective. Drawing on Pierre Bourdieu’s sociology, this paper aims to examine the genesis of Arabic fiction translation into English as a socially situated activity. Works of Arabic fiction emerged in English translation in the early twentieth century. Since then, this intellectual field...


Classifying and Segmenting Classical and Modern Standard Arabic using Minimum Cross-Entropy

Text classification is the process of assigning a text or a document to various predefined classes or categories to reflect their contents. With the rapid growth of Arabic text on the Web, studies that address the problems of classification and segmentation of the Arabic language are limited compared to other languages, most of which implement word-based and feature extraction algorithms. This ...



Journal title:
  • JDIM

Volume 8

Publication date: 2010